Linguistically Motivated Reordering Modeling for Phrase-Based Statistical Machine Translation
Abstract
Word reordering is one of the most difficult aspects of Statistical Machine Translation (SMT), and an important factor in its quality and efficiency. While short- and medium-range reordering is reasonably handled by the phrase-based approach (PSMT), long-range reordering still represents a challenge for state-of-the-art PSMT systems. As a major cause of this problem, we point out the inadequacy of existing reordering constraints and models to cope with the reordering phenomena occurring between distant languages. On the one hand, the reordering constraints used to control translation complexity appear to be too coarse-grained. On the other hand, the reordering models used to score different reordering decisions during translation are not discriminative enough to effectively guide the search over very large sets of hypotheses. In this thesis we propose several techniques to improve the definition of the reordering search space in PSMT by exploiting prior linguistic knowledge, so that long-range reordering may be adequately handled without sacrificing efficiency. In particular, we focus on Arabic-English and German-English: two language pairs characterized by uneven distributions of reordering phenomena, with long-range movements concentrating on few patterns. Through extensive experiments, we show that our techniques can significantly advance the state of the art in PSMT for these challenging language pairs. When compared with a popular tree-based SMT approach, our best PSMT systems achieve comparable or higher reordering accuracies while being considerably faster.
Similar resources
Linguistically Annotated BTG for Statistical Machine Translation
Bracketing Transduction Grammar (BTG) is a natural choice for effective integration of desired linguistic knowledge into statistical machine translation (SMT). In this paper, we propose a Linguistically Annotated BTG (LABTG) for SMT. It conveys linguistic knowledge of source-side syntax structures to BTG hierarchical structures through linguistic annotation. From the linguistically annotated da...
Head Finalization Reordering for Chinese-to-Japanese Machine Translation
In Statistical Machine Translation, reordering rules have proved useful in extracting bilingual phrases and in decoding during translation between languages that are structurally different. Linguistically motivated rules have been incorporated into Chinese-to-English (Wang et al., 2007) and English-to-Japanese (Isozaki et al., 2010b) translation with significant gains to the statistical translati...
Syntactic Phrase Reordering for English-to-Arabic Statistical Machine Translation
Syntactic Reordering of the source language to better match the phrase structure of the target language has been shown to improve the performance of phrase-based Statistical Machine Translation. This paper applies syntactic reordering to English-to-Arabic translation. It introduces reordering rules, and motivates them linguistically. It also studies the effect of combining reordering with Arabi...
Linguistically Annotated Reordering: Evaluation and Analysis
Linguistic knowledge plays an important role in phrase movement in statistical machine translation. To efficiently incorporate linguistic knowledge into phrase reordering, we propose a new approach: Linguistically Annotated Reordering (LAR). In LAR, we build hard hierarchical skeletons and inject soft linguistic knowledge from source parse trees to nodes of hard skeletons during translation. Th...
Context-free reordering, finite-state translation
We describe a class of translation model in which a set of input variants encoded as a context-free forest is translated using a finite-state translation model. The forest structure of the input is well-suited to representing word order alternatives, making it straightforward to model translation as a two-step process: (1) tree-based source reordering and (2) phrase transduction. By treating the...
Publication date: 2013